Automatic Paragraph Segmentation with Lexical and Prosodic Features

Authors

  • Catherine Lai
  • Mireia Farrús
  • Johanna D. Moore
Abstract

As long-form spoken documents become more ubiquitous in everyday life, so does the need for automatic discourse segmentation in spoken language processing tasks. Although previous work has focused on broad topic segmentation, detection of finer-grained discourse units, such as paragraphs, is highly desirable for presenting and analyzing spoken content. To better understand how different aspects of speech cue these subtle discourse transitions, we investigate automatic paragraph segmentation of TED talks. We build lexical and prosodic paragraph segmenters using Support Vector Machines, AdaBoost, and Long Short-Term Memory (LSTM) recurrent neural networks. In general, we find that induced cue words and supra-sentential prosodic features outperform features based on topical coherence, syntactic form and complexity. However, our best performance is achieved by combining a wide range of individually weak lexical and prosodic features, with the sequence-modelling LSTM generally outperforming the other classifiers by a large margin. Moreover, we find that models that allow lower-level interactions between different feature types produce better results than treating lexical and prosodic contributions as separate, independent information sources.
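
The contrast drawn at the end of the abstract, between letting feature types interact at a low level and treating them as independent information sources, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `early_fusion` and `late_fusion`, the interpolation `weight`, and the toy feature values are all assumptions made for illustration. Early fusion concatenates the lexical and prosodic features of each sentence so that a single classifier can learn interactions between them, while late fusion only combines the boundary probabilities of two separately trained models.

```python
from typing import List, Sequence


def early_fusion(lexical: Sequence[Sequence[float]],
                 prosodic: Sequence[Sequence[float]]) -> List[List[float]]:
    """Concatenate per-sentence lexical and prosodic feature vectors.

    A single downstream classifier sees both feature types at once,
    so it can model interactions between them (hypothetical helper).
    """
    if len(lexical) != len(prosodic):
        raise ValueError("both feature streams must cover the same sentences")
    return [list(lex) + list(pros) for lex, pros in zip(lexical, prosodic)]


def late_fusion(p_lexical: Sequence[float],
                p_prosodic: Sequence[float],
                weight: float = 0.5) -> List[float]:
    """Linearly interpolate boundary probabilities from two separate models.

    Each model is treated as an independent information source; `weight`
    is an assumed interpolation parameter, not a value from the paper.
    """
    if len(p_lexical) != len(p_prosodic):
        raise ValueError("both models must score the same sentences")
    return [weight * pl + (1.0 - weight) * pp
            for pl, pp in zip(p_lexical, p_prosodic)]


# Toy example: two sentences, two lexical features and one prosodic feature each.
lexical_feats = [[1.0, 0.0], [0.0, 1.0]]
prosodic_feats = [[0.3], [0.9]]
fused_inputs = early_fusion(lexical_feats, prosodic_feats)

# Toy boundary probabilities from two hypothetical, separately trained models.
fused_scores = late_fusion([0.8, 0.2], [0.4, 0.6])
```

In the early-fusion case each sentence ends up with one three-dimensional input vector. In the late-fusion case the two models never see each other's features, which is the kind of separation the abstract reports as less effective.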

Similar resources

Integrating Prosodic and Lexical Cues for Automatic Topic Segmentation

We present a probabilistic model that uses both prosodic and lexical cues for the automatic segmentation of speech into topically coherent units. We propose two methods for combining lexical and prosodic information using hidden Markov models and decision trees. Lexical information is obtained from a speech recognizer, and prosodic features are extracted automatically from speech waveforms. We e...

Combining Words and Speech Prosody for Automatic Topic Segmentation

We present a probabilistic model that uses both prosodic and lexical cues for the automatic segmentation of speech into topic units. The approach combines hidden Markov models, statistical language models, and prosody-based decision trees. Lexical information is obtained from a speech recognizer, and prosodic features are extracted automatically from speech waveforms. We evaluate our approach o...

How far can prosodic cues help in word segmentation?

Prosodic cues are of great importance in parsing the speech signal into prosodic and lexical units. Listeners detect changes in the prosodic parameters and interpret them to identify sentence modalities or the mood of the speaker. Some automatic speech recognition systems try to use prosodic parameters to detect boundaries of prosodic units and thus help the acoustic decoding process. Although th...

Automatic Labelling of Prosodic Prominence, Phrasing and Disfluencies in French Speech by Simulating the Perception of Naïve and Expert Listeners

We explore the use of machine learning techniques (notably SVM classifiers and Conditional Random Fields) to automate the prosodic labelling of French speech, based on modelling and simulating the perception of prosodic events by naïve and expert listeners. The models are based on previous work on the perception of syllabic prominence and hesitation-related disfluencies, and on an experiment on...

Assessing Prosodic And Text Features For Segmentation Of Mandarin Broadcast News

Automatic topic segmentation, separation of a discourse stream into its constituent stories or topics, is a necessary preprocessing step for applications such as information retrieval, anaphora resolution, and summarization. While significant progress has been made in this area for text sources and for English audio sources, little work has been done in automatic segmentation of other languages...

Publication date: 2016